
wip: gtfs diff engine #1

Draft
cka-y wants to merge 1 commit into main from feat/1637

cka-y (Collaborator) commented Apr 12, 2026

WIP - This feature hasn't been extensively tested

This pull request introduces an initial version of the GTFS Diff Engine, a memory-efficient Python library and CLI that compares two GTFS feeds and produces a structured diff conforming to the GTFS Diff v2 schema. It includes the core diff logic, a public API, a command-line interface, documentation, and supporting scripts for end-to-end usage.

The most important changes are:

Core Functionality and API:

  • Implements the core diff logic in engine.py, exposing a single diff_feeds() function that returns a typed Pydantic model representing the diff result.
  • Defines the GTFS file schema, supported files, and primary key columns in gtfs_definitions.py, with a helper for primary key lookup.
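
A primary-key lookup of the kind described for gtfs_definitions.py might look like the minimal sketch below. The names `PRIMARY_KEYS` and `get_primary_key` are hypothetical, and the table covers only a handful of files; the key columns follow the GTFS Schedule reference and are not necessarily what this PR defines.

```python
from typing import Dict, Optional, Tuple

# Hypothetical subset of a file -> primary-key table. Key columns follow the
# GTFS Schedule reference, not necessarily the PR's gtfs_definitions.py.
PRIMARY_KEYS: Dict[str, Tuple[str, ...]] = {
    "agency.txt": ("agency_id",),
    "stops.txt": ("stop_id",),
    "routes.txt": ("route_id",),
    "trips.txt": ("trip_id",),
    "stop_times.txt": ("trip_id", "stop_sequence"),
    "calendar.txt": ("service_id",),
    "calendar_dates.txt": ("service_id", "date"),
}


def get_primary_key(file_name: str) -> Optional[Tuple[str, ...]]:
    """Return the primary-key columns for a GTFS file, or None if unsupported."""
    return PRIMARY_KEYS.get(file_name)
```

Returning a tuple even for single-column keys lets callers treat simple and composite keys uniformly.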

Command-Line Interface and Tooling:

  • Adds a Click-based CLI (gtfs-diff) in cli.py, supporting options for output file, row change cap, pretty-printing, and feed download timestamps.
  • Provides a Bash script compare_feeds.sh to automate downloading two GTFS feeds by URL and running the diff tool, with argument parsing and error handling.

Documentation and Examples:

  • Expands README.md with a comprehensive overview, installation instructions, usage examples, API reference, supported files table, output schema example, and implementation notes on memory efficiency.
  • Adds docs/architecture.md detailing design goals, module structure, streaming diff algorithm, edge case handling, and future improvements.
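
To illustrate what a memory-conscious streaming diff can look like, here is a minimal sketch of one common approach: index the base file as primary key -> row digest, then stream the new file row by row. This is an assumption about the general technique, not the algorithm documented in docs/architecture.md; `diff_table` and `_row_digest` are hypothetical names, and the result is a plain dict rather than the PR's Pydantic model.

```python
import csv
import hashlib
import io
from typing import Dict, List, Sequence, TextIO, Tuple

Key = Tuple[str, ...]


def _row_digest(row: Dict[str, str]) -> bytes:
    # Hash sorted column/value pairs so column order does not affect the result.
    pairs = sorted((str(k), str(v)) for k, v in row.items())
    payload = "\x1f".join(f"{k}={v}" for k, v in pairs)
    return hashlib.sha256(payload.encode("utf-8")).digest()


def diff_table(base: TextIO, new: TextIO, key_cols: Sequence[str]) -> Dict[str, List[Key]]:
    # Pass 1: keep only key -> digest for the base file, not the full rows.
    base_index: Dict[Key, bytes] = {}
    for row in csv.DictReader(base):
        base_index[tuple(row[c] for c in key_cols)] = _row_digest(row)

    added: List[Key] = []
    modified: List[Key] = []
    # Pass 2: stream the new file and consume matching entries from the index.
    for row in csv.DictReader(new):
        key = tuple(row[c] for c in key_cols)
        old_digest = base_index.pop(key, None)
        if old_digest is None:
            added.append(key)
        elif old_digest != _row_digest(row):
            modified.append(key)

    # Keys never seen in the new file were deleted.
    return {"added": added, "modified": modified, "deleted": list(base_index)}


# Small in-memory example standing in for two versions of stops.txt.
base = io.StringIO("stop_id,stop_name\nA,Alpha\nB,Beta\nD,Delta\n")
new = io.StringIO("stop_id,stop_name\nA,Alpha\nB,Beta Street\nC,Gamma\n")
result = diff_table(base, new, ["stop_id"])
print(result)
```

In this example stop B is reported as modified, C as added, and D as deleted. Memory use grows with the number of base rows, but each entry holds only a key and a fixed-size digest rather than the full row.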

Packaging and Project Setup:

  • Configures packaging with pyproject.toml for installation, development, and test dependencies, and sets up the CLI entry point.
  • Adds package versioning and module entry points in __init__.py and __main__.py.

Together, these changes deliver a documented GTFS diff engine for both programmatic and CLI-based workflows. As noted at the top, it has not yet been extensively tested.
